Challenges and experiences in collecting a chat corpus
Abstract
Present-day access to a wealth of electronically available linguistic data creates enormous opportunities for cutting-edge research questions and analyses. Computer-mediated communication (CMC) data are especially interesting, for example because the multimodal character of new media puts our ideas about discourse issues such as coherence to the test. At the same time, CMC data are ephemeral because of rapidly changing technology. That is why we urgently need to collect CMC discourse data before the technology becomes obsolete. This paper describes a number of challenges we encountered when collecting a chat corpus with data from secondary school children in Amsterdam. These challenges are of various kinds: logistical, ethical and technological.
Journal: JLCL
Volume: 29
Publication date: 2014